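We consider the problem of two-view matching under significant viewpoint changes using view synthesis. We propose two novel methods that minimize the view synthesis overhead. The first, named DenseAffNet, uses dense affine shape estimates from AffNet, which allow it to partition the image and rectify each partition with just a single affine map. The second, named DepthAffNet, combines information from depth maps and affine shape estimates to produce different sets of rectifying affine maps for different image partitions. DenseAffNet is faster than the state of the art and more accurate on generic scenes. DepthAffNet is on par with the state of the art on scenes containing large planes. The evaluation is performed on 3 public datasets: the EVD Dataset, the Strong ViewPoint Changes Dataset, and the IMC Phototourism Dataset.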
We present DeblurGAN, an end-to-end learned method for motion deblurring. The learning is based on a conditional GAN and the content loss. DeblurGAN achieves state-of-the-art performance both in the structural similarity measure and in visual appearance. The quality of the deblurring model is also evaluated in a novel way on a real-world problem: object detection on (de-)blurred images. The method is 5 times faster than the closest competitor, Deep-Deblur [25]. We also introduce a novel method for generating synthetic motion-blurred images from sharp ones, allowing realistic dataset augmentation. The model, code and the dataset are available at https://github.com/KupynOrest/DeblurGAN
Thorough testing of safety-critical autonomous systems, such as self-driving cars, autonomous robots, and drones, is essential for detecting potential failures before deployment. One crucial testing stage is model-in-the-loop testing, where the system model is evaluated by executing various scenarios in a simulator. However, the search space of possible parameters defining these test scenarios is vast, and simulating all combinations is computationally infeasible. To address this challenge, we introduce AmbieGen, a search-based test case generation framework for autonomous systems. AmbieGen uses evolutionary search to identify the most critical scenarios for a given system, and has a modular architecture that allows for the addition of new systems under test, algorithms, and search operators. Currently, AmbieGen supports test case generation for autonomous robots and autonomous car lane keeping assist systems. In this paper, we provide a high-level overview of the framework's architecture and demonstrate its practical use cases.
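
To make the idea of search-based scenario generation concrete, the following is a minimal sketch of an elitist evolutionary loop. It is not AmbieGen's code or API: the scenario encoding (a list of curvature values), the `fitness` function (a stand-in for a simulator run), and all names and parameters are hypothetical choices made for illustration.

```python
import random


def fitness(scenario):
    # Hypothetical stand-in for a simulator run: scenarios with sharper
    # changes between consecutive curvature values count as more critical.
    return sum(abs(b - a) for a, b in zip(scenario, scenario[1:]))


def mutate(scenario, rng, step=0.2):
    # Perturb one randomly chosen value, keeping it inside [-1, 1].
    child = list(scenario)
    i = rng.randrange(len(child))
    child[i] = max(-1.0, min(1.0, child[i] + rng.uniform(-step, step)))
    return child


def evolve(pop_size=20, length=8, generations=50, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.uniform(-1.0, 1.0) for _ in range(length)] for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        # Elitist selection: keep the best pop_size of parents and children.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
        history.append(fitness(population[0]))
    return population[0], history


best, history = evolve()
print(f"best fitness after {len(history)} generations: {history[-1]:.3f}")
```

Because parents compete with their children for survival, the best fitness never decreases from one generation to the next. In a real setting the fitness evaluation would be the expensive simulator call, which is why the search budget matters.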
While recent work on text-conditional 3D object generation has shown promising results, the state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in a number of seconds or minutes. In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at https://github.com/openai/point-e.
This paper presents an evaluation of the quality of automatically generated reading comprehension questions from Swedish text, using the Quinductor method. This method is a light-weight, data-driven but non-neural method for automatic question generation (QG). The evaluation shows that Quinductor is a viable QG method that can provide a strong baseline for neural-network-based QG methods.
Active learning with strong and weak labelers considers a practical setting where we have access to both costly but accurate strong labelers and inaccurate but cheap predictions provided by weak labelers. We study this problem in the streaming setting, where decisions must be taken online. We design a novel algorithmic template, Weak Labeler Active Cover (WL-AC), that is able to robustly leverage the lower-quality weak labelers to reduce the query complexity while retaining the desired level of accuracy. Prior active learning algorithms with access to weak labelers learn a difference classifier which predicts where the weak labels differ from the strong labels; this requires the strong assumption of realizability of the difference classifier (Zhang and Chaudhuri, 2015). WL-AC bypasses this realizability assumption and is thus applicable to many real-world scenarios, such as randomly corrupted weak labels and high-dimensional families of difference classifiers (e.g., deep neural nets). Moreover, WL-AC trades off evaluating the quality of weak labelers against fully exploiting them, which allows any active learning strategy to be converted into one that can leverage weak labelers. We provide an instantiation of this template that achieves the optimal query complexity for any given weak labeler, without knowing its accuracy a priori. Empirically, we propose an instantiation of the WL-AC template that can be efficiently implemented for large-scale models (e.g., deep neural nets) and show its effectiveness on the corrupted-MNIST dataset by significantly reducing the number of labels while keeping the same accuracy as in passive learning.
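We introduce SubGD, a novel few-shot learning method based on the recent finding that stochastic gradient descent updates tend to live in a low-dimensional parameter subspace. In experimental and theoretical analyses, we show that models confined to a suitable predefined subspace generalize well for few-shot learning. A suitable subspace fulfills three criteria across the given tasks: it (a) allows the training error to be reduced by gradient flow, (b) leads to models that generalize well, and (c) can be identified by stochastic gradient descent. SubGD identifies these subspaces from an eigendecomposition of the auto-correlation matrix of update directions across different tasks. Demonstrably, we can identify low-dimensional suitable subspaces for few-shot learning of dynamical systems, which have varying properties described by one or a few parameters of the analytical system description. Such systems are ubiquitous among real-world applications in science and engineering. We experimentally corroborate the advantages of SubGD on three distinct dynamical systems problem settings, significantly outperforming popular few-shot learning methods in terms of both sample efficiency and performance.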
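
The subspace identification step can be illustrated with a small, dependency-free sketch: build the auto-correlation matrix of update directions, extract its leading eigenvectors, and project a gradient onto their span. This is not the SubGD implementation. Power iteration with deflation stands in for a full eigendecomposition, the projection step is an assumption about how a confined update might be formed, and every function name and the toy data are hypothetical.

```python
import math


def autocorrelation(updates):
    # C = (1/n) * sum_k u_k u_k^T over the collected update directions.
    n, d = len(updates), len(updates[0])
    return [[sum(u[i] * u[j] for u in updates) / n for j in range(d)] for i in range(d)]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_eigenvectors(matrix, k, iters=200):
    # Power iteration with deflation; adequate for a small symmetric PSD matrix.
    d = len(matrix)
    m = [row[:] for row in matrix]
    basis = []
    for idx in range(k):
        v = normalize([1.0 / (i + 1 + idx) for i in range(d)])  # deterministic start
        for _ in range(iters):
            w = matvec(m, v)
            if all(x == 0.0 for x in w):
                break  # remaining matrix is zero; keep the current vector
            v = normalize(w)
        eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))
        basis.append(v)
        # Deflate: remove the component just found before the next iteration.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return basis


def project(gradient, basis):
    # Orthogonal projection of the gradient onto span(basis).
    out = [0.0] * len(gradient)
    for b in basis:
        coeff = sum(g * x for g, x in zip(gradient, b))
        out = [o + coeff * x for o, x in zip(out, b)]
    return out


# Toy data: all update directions lie along (1, 1, 0).
updates = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [-1.0, -1.0, 0.0]]
basis = top_eigenvectors(autocorrelation(updates), k=1)
projected = project([1.0, 0.0, 5.0], basis)
print(projected)
```

In this toy case the projected gradient keeps only its component along (1, 1, 0), so the large third coordinate, which no previous update ever moved, is discarded.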
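State-of-the-art encoder-decoder models (e.g., for machine translation (MT) or speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit. No component of the model can be (re-)used without the others. We describe LegoNN, a procedure for building encoder-decoder architectures with decoder modules that can be reused across various MT and ASR tasks without the need for any fine-tuning. To achieve reusability, the interface between each encoder and decoder module is grounded to a sequence of marginal distributions over a discrete vocabulary pre-defined by the model designer. We present two approaches for ingesting these marginals: one is differentiable, allowing gradients to flow across the entire network, and the other is gradient-isolating. To enable portability of decoder modules between MT tasks for different source languages and across other tasks such as ASR, we introduce a modality-agnostic encoder, which consists of a length control mechanism that dynamically adapts the encoder's output length to match the expected input length range of pre-trained decoders. We present several experiments to demonstrate the effectiveness of LegoNN models: a trained language generation LegoNN decoder module from a German-English (De-En) MT task can be reused with no fine-tuning for the Europarl English ASR and the Romanian-English (Ro-En) MT tasks to match or beat the respective baseline models. When fine-tuned towards the target task for a few thousand updates, our LegoNN models improved the Ro-En MT task by 1.5 BLEU points and achieved a 12.5% relative WER reduction for the Europarl ASR task. Furthermore, to show its extensibility, we compose a LegoNN ASR model from three modules, each learned within a different end-to-end trained model on three different datasets, boosting the WER reduction to 19.5%.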
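We develop fast algorithms and robust software for convex optimization of two-layer neural networks with ReLU activation functions. Our work leverages a convex reformulation of the standard weight-decay penalized training problem as a set of group-$\ell_1$-regularized data-local models, where locality is enforced by polyhedral cone constraints. In the special case of zero regularization, we show that this problem is exactly equivalent to unconstrained optimization of a convex "gated ReLU" network. For problems with non-zero regularization, we show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem. To optimize the convex reformulations, we develop an accelerated proximal gradient method and a practical augmented Lagrangian solver. We show that these approaches are faster than standard training heuristics for the non-convex problem, such as SGD, and outperform commercial interior-point solvers. Experimentally, we verify our theoretical results, explore the group-$\ell_1$ regularization path, and scale convex optimization for neural networks to image classification on MNIST and CIFAR-10.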
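
A proximal gradient method for a group-$\ell_1$ penalty alternates a gradient step on the smooth loss with the penalty's proximal operator, which is block-wise soft-thresholding. The sketch below shows only that operator, in plain Python. It is the standard textbook formula, not code from the authors' solver, and the function name and example values are illustrative assumptions.

```python
import math


def group_soft_threshold(groups, threshold):
    """Proximal operator of threshold * sum_i ||w_i||_2, applied per group."""
    result = []
    for w in groups:
        norm = math.sqrt(sum(x * x for x in w))
        if norm <= threshold:
            # The whole group is zeroed out; this is what yields group sparsity.
            result.append([0.0] * len(w))
        else:
            # Otherwise the group is shrunk towards zero along its own direction.
            scale = 1.0 - threshold / norm
            result.append([scale * x for x in w])
    return result


# First group has norm 5 and is shrunk; second has norm 0.5 and is removed.
shrunk = group_soft_threshold([[3.0, 4.0], [0.3, 0.4]], threshold=1.0)
print(shrunk)
```

Sweeping `threshold` from large to small values switches groups on one at a time, which is the intuition behind tracing a regularization path.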
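The Mohamed Bin Zayed International Robotics Challenge (MBZIRC) 2020 posed diverse challenges for unmanned aerial vehicles (UAVs). We present our four tailored UAVs, developed specifically for the individual aerial-robot tasks of MBZIRC, including custom hardware and software components. In Challenge 1, a target UAV is pursued using a high-efficiency onboard object detection pipeline in order to capture a ball from it. A second UAV uses a similar detection method to find and pop balloons scattered throughout the arena. For Challenge 2, we demonstrate a larger UAV capable of autonomous aerial manipulation: bricks are found and tracked from camera images, and are subsequently approached, picked, transported, and placed on a wall. Finally, in Challenge 3, our UAV autonomously finds fires using LiDAR and thermal cameras and extinguishes them with an onboard fire extinguisher. While every robot features task-specific subsystems, all UAVs rely on a standard software stack developed for this particular competition and for future ones. We present our mostly open-source software solutions, including tools for system configuration, monitoring, robust wireless communication, high-level control, and agile trajectory generation. To solve the MBZIRC 2020 tasks, we advanced the state of the art in multiple research areas, such as machine vision and trajectory generation. We present the scientific contributions that form the foundation of our algorithms and systems and analyze the results from the MBZIRC 2020 competition in Abu Dhabi, where our systems reached second place in the Grand Challenge. Furthermore, we discuss the lessons learned from our participation in this complex robotic challenge.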